{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# `skweak`: a quick demonstration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Start: preparing the corpus"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have a small corpus of 200 news articles that we wish to annotate with two entity types: \n",
    "- companies\n",
    "- other (non-commercial) organisations.\n",
    "\n",
    "The first step is to extract the texts from the corpus:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tarfile\n",
    "\n",
    "# We retrieve the texts\n",
    "texts = [] \n",
    "archive_file = tarfile.open(\"../data/reuters_small.tar.gz\")\n",
    "for archive_member in archive_file.getnames():\n",
    "    if archive_member.endswith(\".txt\"):\n",
    "        text = archive_file.extractfile(archive_member).read().decode(\"utf8\")\n",
    "        texts.append(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now run Spacy on those texts to obtain `Doc` objects"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import spacy\n",
    "\n",
    "# We run spacy on the texts    \n",
    "nlp = spacy.load(\"en_core_web_sm\", disable=[\"ner\", \"lemmatizer\"])\n",
    "docs = list(nlp.pipe(texts))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "\n",
    "## Step 1: Labelling functions\n",
    "\n",
    "Labelling functions are at the core of `skweak`. They take a `Doc` as input and returns a list of spans with their associated labels. \n",
    "\n",
    "One simple type of labelling functions are heuristics. For instance, we can write that commercial companies may be recognized by their legal suffix (such as Corp.):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<span class=\"tex2jax_ignore\"><div class=\"entities\" style=\"line-height: 2.5; direction: ltr\">N.Y. judge restrains Merkin funds in Madoff lawsuit</br>NEW YORK (Reuters) - A judge extended an order on Tuesday barring hedge fund founder Ezra Merkin from shutting down funds that had invested with accused swindler Bernard Madoff or withdrawing money from them. New York State Supreme Court Justice Richard Lowe issued the extension in a lawsuit brought on December 23 by New York University, which says it lost $24 million when funds run by Merkin invested money with Madoff without its consent. Another judge issued the initial order on December 24 to stop Merkin from \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    liquidating Ariel Fund Ltd\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       ", named in the lawsuit by the university along with Merkin and \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    his Gabriel Capital Corp.\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " The initial order expired on Tuesday. Madoff, an investment adviser, is accused of running a $50 billion securities fraud over many years. &quot;Until December 12, 2008, we had no knowledge that NYU's funds were instead being managed by Bernard Madoff,&quot; said an affidavit filed with the court by NYU chief investment officer Maurice Maertens. &quot;None of the documents we received throughout the years from Gabriel or Ariel ever stated that Mr Madoff was managing NYU's assets.&quot; Merkin's lawyer, Andrew Levander, said in a statement that his client &quot;has always acted in good faith and did not deceive NYU or any other investors.&quot; Merkin's personal losses from the purported fraud &quot;are in the many tens of millions of dollars...He shares the sorrow of all investors who have been cheated by Madoff,&quot; the statement said. NYU claims the title as the largest private U.S. university and is among several institutions and individual investors seeking to recover losses. Merkin is chairman of GMAC, the finance business owned by \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    General Motors Corp\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " ( GM.N ) and private equity firm Cerberus Capital Management LP CBS.UL. The U.S. Treasury gave GMAC $5 billion from its $700 billion Troubled Asset Relief Program on December 29. Five days earlier, the Federal Reserve granted GMAC bank holding company status so it could get access to the bailout money. Merkin has also been sued in U.S. District Court in Manhattan for his management of Ascot Partners LLP, a fund he founded that lost an estimated $1.8 billion with Madoff. Madoff, 70, was arrested on December 11 and charged with securities fraud. He is under house arrest in his Manhattan apartment on $10 million bail. On Monday, U.S. prosecutors sought to jail him, saying he had mailed jewelry and other valuables to family and friends in violation of a court order freezing all his belongings. A judge has not yet issued a ruling on the government's request. The case is New York University v. \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Ariel Fund Ltd\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " 08- 08603803 in New York State Supreme Court (Manhattan) </div></span>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import skweak\n",
    "\n",
    "def company_detector_fun(doc):\n",
    "    for chunk in doc.noun_chunks:\n",
    "        if chunk[-1].lower_.rstrip(\".\") in {'corp', 'inc', 'ltd', 'llc', 'sa', 'ag'}:\n",
    "            yield chunk.start, chunk.end, \"COMPANY\"\n",
    "\n",
    "# We create the labelling function by giving it a name, and a function to apply\n",
    "company_detector = skweak.heuristics.FunctionAnnotator(\"company_detector\", company_detector_fun)\n",
    "\n",
    "# We run the function on the full corpus\n",
    "docs = list(company_detector.pipe(docs))\n",
    "\n",
    "# Show an example\n",
    "skweak.utils.display_entities(docs[28], \"company_detector\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "For non-commercial organisations, we can also look for the occurrence of words that are quite typical of public organisations or NGOs: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<span class=\"tex2jax_ignore\"><div class=\"entities\" style=\"line-height: 2.5; direction: ltr\">N.Y. judge restrains Merkin funds in Madoff lawsuit</br>NEW YORK (Reuters) - A judge extended an order on Tuesday barring hedge fund founder Ezra Merkin from shutting down funds that had invested with accused swindler Bernard Madoff or withdrawing money from them. \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    New York State Supreme Court Justice Richard Lowe\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       " issued the extension in a lawsuit brought on December 23 by \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    New York University\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       ", which says it lost $24 million when funds run by Merkin invested money with Madoff without its consent. Another judge issued the initial order on December 24 to stop Merkin from liquidating Ariel Fund Ltd, named in the lawsuit by the university along with Merkin and his Gabriel Capital Corp. The initial order expired on Tuesday. Madoff, an investment adviser, is accused of running a $50 billion securities fraud over many years. &quot;Until December 12, 2008, we had no knowledge that NYU's funds were instead being managed by Bernard Madoff,&quot; said an affidavit filed with the court by NYU chief investment officer Maurice Maertens. &quot;None of the documents we received throughout the years from Gabriel or Ariel ever stated that Mr Madoff was managing NYU's assets.&quot; Merkin's lawyer, Andrew Levander, said in a statement that his client &quot;has always acted in good faith and did not deceive NYU or any other investors.&quot; Merkin's personal losses from the purported fraud &quot;are in the many tens of millions of dollars...He shares the sorrow of all investors who have been cheated by Madoff,&quot; the statement said. NYU claims the title as the largest private U.S. university and is among several institutions and individual investors seeking to recover losses. Merkin is chairman of GMAC, the finance business owned by General Motors Corp ( GM.N ) and private equity firm Cerberus Capital Management LP CBS.UL. The U.S. Treasury gave GMAC $5 billion from its $700 billion Troubled Asset Relief Program on December 29. Five days earlier, the Federal Reserve granted GMAC bank holding company status so it could get access to the bailout money. Merkin has also been sued in \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    U.S. District Court\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       " in Manhattan for his management of Ascot Partners LLP, a fund he founded that lost an estimated $1.8 billion with Madoff. Madoff, 70, was arrested on December 11 and charged with securities fraud. He is under house arrest in his Manhattan apartment on $10 million bail. On Monday, U.S. prosecutors sought to jail him, saying he had mailed jewelry and other valuables to family and friends in violation of a court order freezing all his belongings. A judge has not yet issued a ruling on the government's request. The case is \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    New York University\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       " v. Ariel Fund Ltd 08- 08603803 in \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    New York State Supreme Court\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       " (Manhattan) </div></span>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "OTHER_ORG_CUE_WORDS = {\"University\", \"Institute\", \"College\", \"Committee\", \"Party\", \"Agency\",\n",
    "                       \"Union\", \"Association\", \"Organization\", \"Court\", \"Office\", \"National\"}\n",
    "def other_org_detector_fun(doc):\n",
    "    for chunk in doc.noun_chunks:\n",
    "        if any([tok.text in OTHER_ORG_CUE_WORDS for tok in chunk]):\n",
    "            yield chunk.start, chunk.end, \"OTHER_ORG\"\n",
    "\n",
    "# We create the labelling function\n",
    "other_org_detector = skweak.heuristics.FunctionAnnotator(\"other_org_detector\", other_org_detector_fun)\n",
    "\n",
    "# We run the function on the full corpus\n",
    "docs = list(other_org_detector.pipe(docs))\n",
    "\n",
    "# Show an example\n",
    "skweak.utils.display_entities(docs[28], \"other_org_detector\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "In addition to heuristics, we can also exploit _gazetteers_ that search for the occurrences of entries (often extracted from a knowledge base): "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting data from ../data/crunchbase_companies.json.gz\n",
      "Populating trie for class COMPANY (number: 539174)\n",
      "done building the gazetteer\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<span class=\"tex2jax_ignore\"><div class=\"entities\" style=\"line-height: 2.5; direction: ltr\">N.Y. judge restrains Merkin funds in Madoff lawsuit</br>NEW YORK (\n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Reuters\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       ") - A judge extended an order on Tuesday barring hedge fund founder Ezra Merkin from shutting down funds that had invested with accused swindler Bernard Madoff or withdrawing money from them. New York State Supreme Court Justice Richard Lowe issued the extension in a lawsuit brought on December 23 by New York University, which says it lost $24 million when funds run by Merkin invested money with Madoff without its consent. Another judge issued the initial order on December 24 to stop Merkin from liquidating Ariel Fund Ltd, named in the lawsuit by the university along with Merkin and his Gabriel Capital Corp. The initial order expired on Tuesday. Madoff, an investment adviser, is accused of running a $50 billion securities fraud over many years. &quot;Until December 12, 2008, we had no knowledge that NYU's funds were instead being managed by Bernard Madoff,&quot; said an affidavit filed with the court by NYU chief investment officer Maurice Maertens. &quot;None of the documents we received throughout the years from \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Gabriel\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " or \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Ariel\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " ever stated that Mr Madoff was managing NYU's assets.&quot; Merkin's lawyer, Andrew Levander, said in a statement that his client &quot;has always acted in good faith and did not deceive NYU or any other investors.&quot; Merkin's personal losses from the purported fraud &quot;are in the many tens of millions of dollars...He shares the sorrow of all investors who have been cheated by Madoff,&quot; the statement said. NYU claims the title as the largest private U.S. university and is among several institutions and individual investors seeking to recover losses. Merkin is chairman of GMAC, the finance business owned by General Motors Corp ( GM.N ) and private equity firm Cerberus Capital Management LP CBS.UL. The U.S. Treasury gave GMAC $5 billion from its $700 billion Troubled Asset Relief Program on December 29. Five days earlier, the Federal Reserve granted GMAC bank holding company status so it could get access to the bailout money. Merkin has also been sued in \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    U.S. District Court\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " in Manhattan for his management of Ascot Partners LLP, a fund he founded that lost an estimated $1.8 billion with Madoff. Madoff, 70, was arrested on December 11 and charged with securities fraud. He is under house arrest in his Manhattan apartment on $10 million bail. On Monday, U.S. prosecutors sought to jail him, saying he had mailed jewelry and other valuables to family and friends in violation of a court order freezing all his belongings. A judge has not yet issued a ruling on the government's request. The case is New York University v. Ariel Fund Ltd 08- 08603803 in New York State Supreme Court (Manhattan) </div></span>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "\n",
    "# We extract the entries (from Crunchbase)\n",
    "tries = skweak.gazetteers.extract_json_data(\"../data/crunchbase_companies.json.gz\")\n",
    "gazetteer = skweak.gazetteers.GazetteerAnnotator(\"gazetteer\", tries)\n",
    "print(\"done building the gazetteer\")\n",
    "\n",
    "# We run the function on the full corpus\n",
    "docs = list(gazetteer.pipe(docs))\n",
    "\n",
    "# Show an example\n",
    "skweak.utils.display_entities(docs[28], \"gazetteer\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "And finally, we can also take advantage of machine learning models trained from data of related domains. Here, we will use a spacy model to get the usual named entities:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<span class=\"tex2jax_ignore\"><div class=\"entities\" style=\"line-height: 2.5; direction: ltr\">\n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    N.Y.\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " judge restrains Merkin funds in \n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Madoff\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " lawsuit</br>\n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    NEW YORK\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " (\n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Reuters\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       ") - A judge extended an order on \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Tuesday\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       " barring hedge fund founder \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Ezra Merkin\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " from shutting down funds that had invested with accused swindler \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Bernard Madoff\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " or withdrawing money from them. \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    New York State Supreme Court\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " Justice \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Richard Lowe\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " issued the extension in a lawsuit brought on \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    December 23\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       " by \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    New York University\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       ", which says it lost \n",
       "<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    $24 million\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">MONEY</span>\n",
       "</mark>\n",
       " when funds run by \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Merkin\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " invested money with Madoff without its consent. Another judge issued the initial order on \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    December 24\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       " to stop Merkin from liquidating \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Ariel Fund Ltd\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       ", named in the lawsuit by the university along with Merkin and his \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Gabriel Capital Corp.\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " The initial order expired on \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Tuesday\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       ". Madoff, an investment adviser, is accused of running a \n",
       "<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    $50 billion\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">MONEY</span>\n",
       "</mark>\n",
       " securities fraud over \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    many years\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       ". &quot;Until \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    December 12, 2008\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       ", we had no knowledge that \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    NYU\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       "'s funds were instead being managed by \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Bernard Madoff\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       ",&quot; said an affidavit filed with the court by \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    NYU\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " chief investment officer \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Maurice Maertens\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       ". &quot;None of the documents we received throughout \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    the years\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       " from \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Gabriel\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " or \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Ariel\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " ever stated that Mr Madoff was managing \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    NYU\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       "'s assets.&quot; Merkin's lawyer, \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Andrew Levander\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       ", said in a statement that his client &quot;has always acted in good faith and did not deceive \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    NYU\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " or any other investors.&quot; Merkin's personal losses from the purported fraud &quot;are in the many tens of millions of dollars...He shares the sorrow of all investors who have been cheated by \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Madoff\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       ",&quot; the statement said. NYU claims the title as the largest private \n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    U.S.\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " university and is among several institutions and individual investors seeking to recover losses. Merkin is chairman of \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    GMAC\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       ", the finance business owned by \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    General Motors Corp\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " ( GM.N ) and private equity firm \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Cerberus Capital Management LP CBS.UL\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       ". \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    The U.S. Treasury\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " gave \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    GMAC\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " \n",
       "<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    $5 billion\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">MONEY</span>\n",
       "</mark>\n",
       " from its \n",
       "<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    $700 billion\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">MONEY</span>\n",
       "</mark>\n",
       " \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Troubled Asset Relief Program\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " on \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    December 29\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       ". \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Five days earlier\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       ", \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    the Federal Reserve\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " granted \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    GMAC\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " bank holding company status so it could get access to the bailout money. Merkin has also been sued in \n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    U.S.\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    District Court\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " in \n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Manhattan\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " for his management of \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Ascot Partners LLP\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       ", a fund he founded that lost \n",
       "<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    an estimated $1.8 billion\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">MONEY</span>\n",
       "</mark>\n",
       " with Madoff. Madoff, \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    70\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       ", was arrested on \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    December 11\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       " and charged with securities fraud. He is under house arrest in his \n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Manhattan\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " apartment on \n",
       "<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    $10 million\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">MONEY</span>\n",
       "</mark>\n",
       " bail. On \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Monday\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       ", \n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    U.S.\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " prosecutors sought to jail him, saying he had mailed jewelry and other valuables to family and friends in violation of a court order freezing all his belongings. A judge has not yet issued a ruling on the government's request. The case is \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    New York University\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " v. \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Ariel Fund Ltd\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " 08- 08603803 in \n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    New York State Supreme Court\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " (\n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Manhattan\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       ") </div></span>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Apply spaCy's pretrained NER model (trained on OntoNotes 5.0) as a labelling function\n",
    "ner = skweak.spacy.ModelAnnotator(\"spacy\", \"en_core_web_sm\")\n",
    "docs = list(ner.pipe(docs))\n",
    "\n",
    "# Show an example\n",
    "skweak.utils.display_entities(docs[28], \"spacy\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br> \n",
    "\n",
    "## Step 2: Aggregation\n",
    "\n",
    "Once the labelling functions have been applied, we must aggregate their outputs to obtain a single annotation for each document. `skweak` does this by estimating a generative model (here, a hidden Markov model) over those outputs. The aggregation takes only a few lines of code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Starting iteration 1\n",
      "Finished E-step with 195 documents\n",
      "Starting iteration 2\n",
      "         1      -39106.4420             +nan\n",
      "Finished E-step with 195 documents\n",
      "Starting iteration 3\n",
      "         2      -39020.9787         +85.4633\n",
      "Finished E-step with 195 documents\n",
      "Starting iteration 4\n",
      "         3      -39007.4458         +13.5329\n",
      "Finished E-step with 195 documents\n",
      "         4      -39005.8610          +1.5848\n"
     ]
    }
   ],
   "source": [
    "# Define the aggregation model: an HMM over the two target labels\n",
    "model = skweak.aggregation.HMM(\"hmm\", [\"COMPANY\", \"OTHER_ORG\"])\n",
    "\n",
    "# Declare \"ORG\" as an underspecified label, which may\n",
    "# correspond to either COMPANY or OTHER_ORG\n",
    "model.add_underspecified_label(\"ORG\", [\"COMPANY\", \"OTHER_ORG\"])\n",
    "\n",
    "# Fit the model on the documents and aggregate the labelling function outputs\n",
    "docs = model.fit_and_aggregate(docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<span class=\"tex2jax_ignore\"><style>\n",
       ".tooltip {  position: relative;  border-bottom: 1px dotted black; }\n",
       ".tooltip .tooltip-text {visibility: hidden;  background-color: black;  color: white;\n",
       "                        line-height: 1.2;  text-align: right;  border-radius: 6px;\n",
       "                        padding: 5px 0; position: absolute; z-index: 1; margin-left:1em;\n",
       "                        opacity: 0; transition: opacity 1s;}\n",
       ".tooltip .tooltip-text::after {position: absolute; top: 1.5em; right: 100%; margin-top: -5px;\n",
       "                               border-width: 5px; border-style: solid; \n",
       "                               border-color: transparent black transparent transparent;}\n",
       ".tooltip:hover .tooltip-text {visibility: visible; opacity: 1;}\n",
       "</style>\n",
       "<div class=\"entities\" style=\"line-height: 2.5; direction: ltr\"><label class='tooltip'>N.Y.<span class='tooltip-text' style='width:147px'>spacy:\tGPE&nbsp;&nbsp</span></label> judge restrains Merkin funds in <label class='tooltip'>Madoff<span class='tooltip-text' style='width:147px'>spacy:\tGPE&nbsp;&nbsp</span></label> lawsuit</br><label class='tooltip'>NEW<span class='tooltip-text' style='width:147px'>spacy:\tGPE&nbsp;&nbsp</span></label> <label class='tooltip'>YORK<span class='tooltip-text' style='width:147px'>spacy:\tGPE&nbsp;&nbsp</span></label> (\n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>Reuters<span class='tooltip-text' style='width:203px'>gazetteer:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       ") - A judge extended an order on <label class='tooltip'>Tuesday<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> barring hedge fund founder <label class='tooltip'>Ezra<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label> <label class='tooltip'>Merkin<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label> from shutting down funds that had invested with accused swindler <label class='tooltip'>Bernard<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label> <label class='tooltip'>Madoff<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label> or withdrawing money from them. \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>New<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>York<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>State<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Supreme<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Court<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Justice<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp</span></label> <label class='tooltip'>Richard<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tPERSON&nbsp;&nbsp</span></label> <label class='tooltip'>Lowe<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tPERSON&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       " issued the extension in a lawsuit brought on <label class='tooltip'>December<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>23<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> by \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>New<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>York<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>University<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       ", which says it lost <label class='tooltip'>$<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label><label class='tooltip'>24<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> <label class='tooltip'>million<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> when funds run by <label class='tooltip'>Merkin<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> invested money with Madoff without its consent. Another judge issued the initial order on <label class='tooltip'>December<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>24<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> to stop Merkin from \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>liquidating<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp</span></label> <label class='tooltip'>Ariel<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Fund<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Ltd<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       ", named in the lawsuit by the university along with Merkin and \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>his<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp</span></label> <label class='tooltip'>Gabriel<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Capital<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Corp.<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " The initial order expired on <label class='tooltip'>Tuesday<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label>. Madoff, an investment adviser, is accused of running a <label class='tooltip'>$<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label><label class='tooltip'>50<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> <label class='tooltip'>billion<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> securities fraud over <label class='tooltip'>many<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>years<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label>. &quot;Until <label class='tooltip'>December<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>12<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label><label class='tooltip'>,<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>2008<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label>, we had no knowledge that <label class='tooltip'>NYU<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label>'s funds were instead being managed by <label class='tooltip'>Bernard<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label> <label class='tooltip'>Madoff<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label>,&quot; said an affidavit filed with the court by <label class='tooltip'>NYU<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> chief investment officer <label class='tooltip'>Maurice<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label> <label 
class='tooltip'>Maertens<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label>. &quot;None of the documents we received throughout <label class='tooltip'>the<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>years<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> from \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>Gabriel<span class='tooltip-text' style='width:203px'>gazetteer:\tCOMPANY&nbsp;&nbsp<br>spacy:\tPERSON&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " or \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>Ariel<span class='tooltip-text' style='width:203px'>gazetteer:\tCOMPANY&nbsp;&nbsp<br>spacy:\tPERSON&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " ever stated that Mr Madoff was managing <label class='tooltip'>NYU<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label>'s assets.&quot; Merkin's lawyer, <label class='tooltip'>Andrew<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label> <label class='tooltip'>Levander<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label>, said in a statement that his client &quot;has always acted in good faith and did not deceive <label class='tooltip'>NYU<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> or any other investors.&quot; Merkin's personal losses from the purported fraud &quot;are in the many tens of millions of dollars...He shares the sorrow of all investors who have been cheated by <label class='tooltip'>Madoff<span class='tooltip-text' style='width:168px'>spacy:\tPERSON&nbsp;&nbsp</span></label>,&quot; the statement said. NYU claims the title as the largest private <label class='tooltip'>U.S.<span class='tooltip-text' style='width:147px'>spacy:\tGPE&nbsp;&nbsp</span></label> university and is among several institutions and individual investors seeking to recover losses. Merkin is chairman of <label class='tooltip'>GMAC<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label>, the finance business owned by \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>General<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Motors<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Corp<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " ( GM.N ) and private equity firm <label class='tooltip'>Cerberus<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Capital<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Management<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>LP<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>CBS.UL<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label>. <label class='tooltip'>The<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>U.S.<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Treasury<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> gave <label class='tooltip'>GMAC<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>$<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label><label class='tooltip'>5<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> <label class='tooltip'>billion<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> from its <label class='tooltip'>$<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label><label class='tooltip'>700<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> <label class='tooltip'>billion<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> <label class='tooltip'>Troubled<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Asset<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label 
class='tooltip'>Relief<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Program<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> on <label class='tooltip'>December<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>29<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label>. <label class='tooltip'>Five<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>days<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>earlier<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label>, <label class='tooltip'>the<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Federal<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Reserve<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> granted <label class='tooltip'>GMAC<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> bank holding company status so it could get access to the bailout money. Merkin has also been sued in \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>U.S.<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>gazetteer:\tCOMPANY&nbsp;&nbsp<br>spacy:\tGPE&nbsp;&nbsp</span></label> <label class='tooltip'>District<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>gazetteer:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Court<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>gazetteer:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       " in <label class='tooltip'>Manhattan<span class='tooltip-text' style='width:147px'>spacy:\tGPE&nbsp;&nbsp</span></label> for his management of <label class='tooltip'>Ascot<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Partners<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>LLP<span class='tooltip-text' style='width:147px'>spacy:\tORG&nbsp;&nbsp</span></label>, a fund he founded that lost <label class='tooltip'>an<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> <label class='tooltip'>estimated<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> <label class='tooltip'>$<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label><label class='tooltip'>1.8<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> <label class='tooltip'>billion<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> with Madoff. Madoff, <label class='tooltip'>70<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label>, was arrested on <label class='tooltip'>December<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> <label class='tooltip'>11<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label> and charged with securities fraud. 
He is under house arrest in his <label class='tooltip'>Manhattan<span class='tooltip-text' style='width:147px'>spacy:\tGPE&nbsp;&nbsp</span></label> apartment on <label class='tooltip'>$<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label><label class='tooltip'>10<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> <label class='tooltip'>million<span class='tooltip-text' style='width:161px'>spacy:\tMONEY&nbsp;&nbsp</span></label> bail. On <label class='tooltip'>Monday<span class='tooltip-text' style='width:154px'>spacy:\tDATE&nbsp;&nbsp</span></label>, <label class='tooltip'>U.S.<span class='tooltip-text' style='width:147px'>spacy:\tGPE&nbsp;&nbsp</span></label> prosecutors sought to jail him, saying he had mailed jewelry and other valuables to family and friends in violation of a court order freezing all his belongings. A judge has not yet issued a ruling on the government's request. The case is \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>New<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>York<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>University<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       " v. \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>Ariel<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Fund<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label> <label class='tooltip'>Ltd<span class='tooltip-text' style='width:252px'>company_detector:\tCOMPANY&nbsp;&nbsp<br>spacy:\tORG&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">COMPANY</span>\n",
       "</mark>\n",
       " 08- 08603803 in \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    <label class='tooltip'>New<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tGPE&nbsp;&nbsp</span></label> <label class='tooltip'>York<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tGPE&nbsp;&nbsp</span></label> <label class='tooltip'>State<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tGPE&nbsp;&nbsp</span></label> <label class='tooltip'>Supreme<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tGPE&nbsp;&nbsp</span></label> <label class='tooltip'>Court<span class='tooltip-text' style='width:280px'>other_org_detector:\tOTHER_ORG&nbsp;&nbsp<br>spacy:\tGPE&nbsp;&nbsp</span></label>\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">OTHER_ORG</span>\n",
       "</mark>\n",
       " (<label class='tooltip'>Manhattan<span class='tooltip-text' style='width:147px'>spacy:\tGPE&nbsp;&nbsp</span></label>) </div></span>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Note: if you are running Jupyter Notebook instead of Jupyter Lab, you need to \n",
    "# set add_tooltip=False, as Jupyter Notebook does not support HTML tooltips\n",
    "skweak.utils.display_entities(docs[28], \"hmm\", add_tooltip=True) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "\n",
    "## Step 3: Training the final model\n",
    "    \n",
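    "Once the corpus is labelled, we can train any type of machine learning model on it. Here, we copy the aggregated `hmm` spans into `doc.ents`, write the documents to a spaCy `DocBin` file, and train a spaCy NER model on that file."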
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Write to ../data/reuters_small.spacy...done\n"
     ]
    }
   ],
   "source": [
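    "# Use the aggregated \"hmm\" spans as each document's entities\n",
    "for doc in docs:\n",
    "    doc.ents = doc.spans[\"hmm\"]\n",
    "# Serialise the annotated documents to a spaCy DocBin file\n",
    "skweak.utils.docbin_writer(docs, \"../data/reuters_small.spacy\")"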
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[38;5;4mℹ Using CPU\u001b[0m\n",
      "\u001b[1m\n",
      "=========================== Initializing pipeline ===========================\u001b[0m\n",
      "[2021-04-29 12:16:56,425] [INFO] Set up nlp object from config\n",
      "[2021-04-29 12:16:56,437] [INFO] Pipeline: ['tok2vec', 'ner']\n",
      "[2021-04-29 12:16:56,442] [INFO] Created vocabulary\n",
      "[2021-04-29 12:16:59,300] [INFO] Added vectors: en_core_web_md\n",
      "[2021-04-29 12:16:59,301] [INFO] Finished initializing nlp object\n",
      "[2021-04-29 12:17:09,940] [INFO] Initialized pipeline components: ['tok2vec', 'ner']\n",
      "\u001b[38;5;2m✔ Initialized pipeline\u001b[0m\n",
      "\u001b[1m\n",
      "============================= Training pipeline =============================\u001b[0m\n",
      "\u001b[38;5;4mℹ Pipeline: ['tok2vec', 'ner']\u001b[0m\n",
      "\u001b[38;5;4mℹ Initial learn rate: 0.001\u001b[0m\n",
      "E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE \n",
      "---  ------  ------------  --------  ------  ------  ------  ------\n",
      "  0       0          0.00     85.00    0.31    0.25    0.41    0.00\n",
      "  1     200        258.61   5153.69   75.68   73.13   78.42    0.76\n",
      "^C\n"
     ]
    }
   ],
   "source": [
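    "# Generate a default NER config and train on the weakly labelled corpus.\n",
    "# The same file is used for both training and development here, so the\n",
    "# reported scores are not a held-out evaluation.\n",
    "!spacy init config - --lang en --pipeline ner --optimize accuracy | \\\n",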
    "spacy train - --paths.train ../data/reuters_small.spacy  --paths.dev ../data/reuters_small.spacy \\\n",
    "--initialize.vectors en_core_web_md --output ../data/reuters_small\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is of course just a very short example. Please look at the domain-specific Jupyter notebooks in the `examples/ner` and `examples/sentiment` directories for more details."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.8 64-bit ('base': conda)",
   "name": "python388jvsc74a57bd030d45502beca8fd3221ccf172fddc768e31e1ebd376d84bb809a361701964046"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
